-
Notifications
You must be signed in to change notification settings - Fork 63
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes in functions for generation and an example of vertical plot with additional charts #182
base: master
Are you sure you want to change the base?
Conversation
…nerate more than one feature
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any chance you feel like extending this to close #52 at the same time? (I.e. add some tests for these functions)
And thank you for contributing! |
Thank you for the library you have done the heavy lifting in this point.
I am not used to writing automatic tests on python But I could sure try to
do it.
Sorry for the long delay on the example but as you know it has been a long
and hard time the last year.
…On Mon, Feb 28, 2022 at 1:21 AM Joel Nothman ***@***.***> wrote:
And thank you for contributing!
—
Reply to this email directly, view it on GitHub
<#182 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ABWRIKRPIOXE45HCPFU7PO3U5K52PANCNFSM5PIGFHXA>
.
Triage notifications on the go with GitHub Mobile for iOS
<https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675>
or Android
<https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub>.
You are receiving this because you authored the thread.Message ID:
***@***.***>
|
Already added the unitary tests for those two functions. You should review them but I think that you can close issue #52 |
I have made some additional changes, but some examples failed due to their usage of the property name which, by the way, can cause like in this case problems when we change from a single column view to a multi-column result |
With luck, I'll have time to look at this next week, @ennanco |
The failures currently in CI pertain to our continued support for very old version of Python (time to drop them, I think), and PEP8 violations in your contribution. Here are those PEP8 issues:
I could just adopt black to make resolving these simpler... |
Thanks for your efforts here! Can we make a point of keeping the existing behaviour of |
Hi Joel,
I think that the proposed name would be suitable in order to keep it as
retrocompatible as possible.
Sincerely,
…-----
Quique
On Sat, Mar 19, 2022 at 1:46 PM Joel Nothman ***@***.***> wrote:
Thanks for your efforts here! Can we make a point of keeping the existing
behaviour of generate_counts stable and backwards compatible? That is,
unless extra columns are requested, the user should not have to go .value.
Should we call the new generation parameter extra_columns?
—
Reply to this email directly, view it on GitHub
<#182 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ABWRIKUTHVVX3EROVNQUYGTVAXEDRANCNFSM5PIGFHXA>
.
Triage notifications on the go with GitHub Mobile for iOS
<https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675>
or Android
<https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub>.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
I have finally make the proposed changes with the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we do the same in generate_samples
? Could you please update the docstring? Thanks
…consistency to use extra_columns in order to keep retrocompatibility
Yes, I thought about it, however, I don't want to change it without your indication. Sorry about the docstring, I have already updated it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Finally reviewing this! It's looking pretty good, but I'm wondering about:
- the use of the name
value{i}
, or whether there's a distinctly better column name - whether we should consider generating values in a way that varies with the categories, such that the visualisations show covariance
Co-authored-by: Joel Nothman <joeln@canva.com>
Co-authored-by: Joel Nothman <joeln@canva.com>
Co-authored-by: Joel Nothman <joeln@canva.com>
Co-authored-by: Joel Nothman <joeln@canva.com>
Co-authored-by: Joel Nothman <joeln@canva.com>
df = pd.DataFrame({'value': np.zeros(n_samples)}) | ||
len_samples = 1 + extra_columns | ||
df = pd.DataFrame(np.zeros((n_samples, len_samples))) | ||
valuename_lst = [f'value{i}' if i > 0 else 'value' for i in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we just call this variable columns
or column_names
?
extra_columns=extra_columns) | ||
df.drop('index', axis=1, inplace=True) | ||
df = df if extra_columns > 0 else df.value | ||
return df.groupby(level=list(range(n_categories))).count() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think counting is meaningful for the extra columns. Maybe we should use a different aggregate?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or maybe we shouldn't offer this functionality in generate_counts
, making things somewhat simpler.
r = rng.rand(n_samples, len_samples) | ||
df[f'cat{i}'] = r[:, 0] > rng.rand() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This puzzles me. We're only using the first column of a random matrix of values, and extra_columns
is unused.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't worry about making the values correlate with the categories. Just put in the docstring that the extra column values may change in a future version so we have licence to do it later.
######################################################################### | ||
# An UpSetplot with additional plots on vertical | ||
# and tuning some visual parameters | ||
example = generate_counts(extra_columns=2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think using generate_samples here makes more sense? But maybe 10k samples is a lot for three swam plots.
I have slightly modified the functions generate_samples and generate_counts to get the data to develop a vertical Upsetplot with additional charts attached to it.